A dissimilarity matrix between protein atom classes based on Gaussian mixtures

Authors

  • Ville-Veikko Rantanen
  • Mats Gyllenberg
  • Timo Koski
  • Mark S. Johnson
Abstract

MOTIVATION: Previously, Rantanen et al. (2001; J. Mol. Biol., 313, 197-214) constructed a protein atom-ligand fragment interaction library embodying experimentally solved, high-resolution three-dimensional (3D) structural data from the Protein Data Bank (PDB). The spatial locations of protein atoms that surround ligand fragments were modeled with Gaussian mixture models, the parameters of which were estimated with the expectation-maximization (EM) algorithm. In the validation analysis of this library, there was a strong indication that the protein atom classification (24 classes) was too large and that reducing the number of classes would lead to improved predictions.

RESULTS: Here, a dissimilarity (distance) matrix suitable for comparing and fusing the 24 pre-defined protein atom classes has been derived. Jeffreys' distances between Gaussian mixture models are used as a basis for estimating dissimilarities between protein atom classes. The dissimilarity data are analyzed both with a hierarchical clustering method and, independently, with multidimensional scaling analysis. The results provide additional insight into the relationships between different protein atom classes and give guidance on, for example, how to readjust the protein atom classification; they will thus help to improve protein-ligand interaction predictions.

CONTACT: [email protected]
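As an illustration of the idea in the RESULTS paragraph, the sketch below estimates a Jeffreys' distance (the symmetrized Kullback-Leibler divergence, KL(p||q) + KL(q||p)) between two Gaussian mixtures and collects the pairwise values into a dissimilarity matrix. The abstract does not say how the distances were computed, so several things here are assumptions: the Monte Carlo estimator, the restriction to isotropic 3D components, and every function name (`log_density`, `sample`, `kl_estimate`, `jeffreys_distance`, `dissimilarity_matrix`) are hypothetical and are not taken from the paper.

```python
import math
import random

# A mixture is a list of (weight, mean, sigma) components. Each component is an
# isotropic 3D Gaussian; isotropy is a simplifying assumption of this sketch.
# Weights are assumed to be strictly positive and to sum to 1.


def log_density(mixture, x):
    """Log of the mixture density at point x, using log-sum-exp for stability."""
    logs = []
    for weight, mean, sigma in mixture:
        sq_dist = sum((xi - mi) ** 2 for xi, mi in zip(x, mean))
        log_norm = -1.5 * math.log(2.0 * math.pi * sigma ** 2)
        logs.append(math.log(weight) + log_norm - sq_dist / (2.0 * sigma ** 2))
    top = max(logs)
    return top + math.log(sum(math.exp(v - top) for v in logs))


def sample(mixture, rng):
    """Draw one 3D point: pick a component by weight, then sample from it."""
    weights = [w for w, _, _ in mixture]
    _, mean, sigma = rng.choices(mixture, weights=weights, k=1)[0]
    return [rng.gauss(m, sigma) for m in mean]


def kl_estimate(p, q, rng, n):
    """Monte Carlo estimate of KL(p || q) as the mean of log p(x) - log q(x), x ~ p."""
    total = 0.0
    for _ in range(n):
        x = sample(p, rng)
        total += log_density(p, x) - log_density(q, x)
    return total / n


def jeffreys_distance(p, q, n=2000, seed=0):
    """Jeffreys' distance: the sum of the two directed KL divergences."""
    rng = random.Random(seed)
    return kl_estimate(p, q, rng, n) + kl_estimate(q, p, rng, n)


def dissimilarity_matrix(mixtures, n=2000, seed=0):
    """Symmetric matrix of pairwise Jeffreys' distances with a zero diagonal."""
    size = len(mixtures)
    matrix = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(i + 1, size):
            d = jeffreys_distance(mixtures[i], mixtures[j], n, seed)
            matrix[i][j] = matrix[j][i] = d
    return matrix
```

A matrix of this kind is the sort of input that a hierarchical clustering or multidimensional scaling routine would take. Because the estimate is stochastic, values for nearly identical mixtures are noisy, and a fixed seed is used here only to make runs repeatable.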

Similar resources

Modeling high-level information by using Gaussian mixture correlation for GMM-UBM based speaker recognition

The Gaussian mixture model-universal background model (GMM-UBM) has been dominant in text-independent speaker recognition tasks. However the conventional GMM-UBM method assumes that each Gaussian mixture is independent and ignores the fact that within Gaussian mixtures, there do exist some useful high-level speaker-dependent characteristics, such as word usage or speaking habits. Based on the G...

Adaptive Bayesian multivariate density estimation with Dirichlet mixtures

We show that rate-adaptive multivariate density estimation can be performed using Bayesian methods based on Dirichlet mixtures of normal kernels with a prior distribution on the kernel’s covariance matrix parameter. We derive sufficient conditions on the prior specification that guarantee convergence to a true density at a rate that is minimax optimal for the smoothness class to which the true ...

Complexity of Dissimilarity Based Pattern Classes

If the proper dissimilarity measures are provided, some sets of objects, e.g. curves or blobs, may be better described by using representation sets instead of features. The dissimilarity matrix of such a set is the base for further analysis. The question arises how from a given dissimilarity matrix can be judged whether the size of the training set is sufficient to describe the peculiarities of...

A Dtw-based Dissimilarity Measure for Models and Its Application to Wo

We propose a dynamic time-warping (DTW) based distortion measure for measuring the dissimilarity between pairs of left-to-right continuous density hidden Markov models with state observation densities being mixture of Gaussians. The local distortion score required in DTW is defined as an approximate Kullback-Leibler divergence (KLD) between two Gaussian mixture models (GMMs). Several approximat...

Methods for merging Gaussian mixture components

The problem of merging Gaussian mixture components is discussed in situations where a Gaussian mixture is fitted but the mixture components are not separated enough from each other to interpret them as “clusters”. The problem of merging Gaussian mixtures is not statistically identifiable, therefore merging algorithms have to be based on subjective cluster concepts. Cluster concepts based on uni...

Journal title:
  • Bioinformatics

Volume 18, Issue 9

Pages  -

Publication date 2002